Chapter 2: Descriptive Statistics

Author

Colin Foster

Welcome to the online content for Chapter 2!

I’ll assume that you’ve already read Chapters 1 and 2 of the book and worked through the online content for Chapter 1. If not, please do that first.

As always, if you click the ‘Run Code’ buttons below, you can execute the R code. Remember that sometimes these buttons will say ‘Loading webR…’, and you’ll need to wait until they say ‘Run Code’ before you can press them. Run the boxes in order, without omitting any; otherwise, later boxes may give you errors, because they may depend on things you did earlier, such as reading in a data set.

We began this chapter with the same data set that we finished Chapter 1 with. Let’s read in that data set again, but just call the dataframe ‘people’ this time.

We can remind ourselves what the data looks like by plotting it:

It’s a good habit to have a look at a plot before you start doing any calculations, so that you know what you’re dealing with and can spot any potential problems, if things don’t look the way that they should.
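As a minimal sketch of this ‘plot first’ habit, the box below builds a small made-up dataframe and draws a histogram of it. The name ‘people_demo’, the column name ‘height’ and the values themselves are all assumptions for illustration; they are not the real data set, and the plot used in the chapter may be of a different kind.

```r
# Invented heights in cm, standing in for the real data set
people_demo <- data.frame(height = c(158, 162, 165, 167, 170, 171, 174, 178, 183))

# Look at the distribution before calculating anything
hist(people_demo$height,
     main = "Heights (made-up data)",
     xlab = "Height (cm)")
```

If a value looked wildly out of place in a plot like this (a height of 1700 cm, say), that would be a prompt to check the data before trusting any summary statistics.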

Median and mean

Finding the median height or the mean height couldn’t be easier:

We can see from the output that the median is nearly 1 cm more than the mean, and the values match the ones that I stated in the chapter.
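To show the general pattern on data that you can check by hand, here is a small sketch using R’s built-in median and mean functions. The dataframe ‘people_demo’ and its ‘height’ column are hypothetical, with invented values, so the results will not match the ones in the chapter.

```r
# Invented heights in cm, standing in for the real data set
people_demo <- data.frame(height = c(158, 162, 165, 167, 170, 171, 174, 178, 183))

# The middle value when the nine heights are put in order
median(people_demo$height)   # 170

# The total of the heights (1528) divided by how many there are (9)
mean(people_demo$height)     # 169.7778
```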

Skewed data

In the chapter, I also presented a more skewed data set. We can read that in and call the dataframe ‘skewed’.

Notice that clicking ‘Run Code’ on this box doesn’t seem to do anything, because I didn’t include a line asking R to print out the contents of ‘skewed’. But it is doing something, because it’s downloading the data and storing it in a dataframe called ‘skewed’, ready for us to use later.
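The difference between storing something and printing it can be seen in this short sketch. The dataframe ‘example_df’ is a made-up stand-in rather than the downloaded data.

```r
# Assignment stores the dataframe but prints nothing
example_df <- data.frame(value = c(1, 2, 3))

# Typing the name on its own line prints the contents
example_df

# head() prints just the first few rows, which is handy for large data sets
head(example_df)
```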

Use the code box below to calculate the median and mean of this data set. Remember that you can type or paste in whatever you like in this box, and then click ‘Run Code’ to run it. Press ‘enter’ to get additional lines.
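If you’d like to see the pattern on a different example first, here is a sketch with invented values that have one large value stretching the right-hand tail. The name ‘skewed_demo’ and the column name ‘value’ are assumptions; the real ‘skewed’ dataframe may use a different column name.

```r
# Invented right-skewed data: most values are small, one is much larger
skewed_demo <- data.frame(value = c(1, 2, 2, 3, 3, 3, 4, 20))

# The median is barely affected by the single large value
median(skewed_demo$value)   # 3

# The mean is pulled upwards, towards the long tail
mean(skewed_demo$value)     # 4.75
```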

Standard deviation and variance

We also had three other data sets, all of which had the same mean, but with different standard deviations. Let’s read in those data sets and call the dataframes ‘first’, ‘second’ and ‘third’.

We had dataframes called ‘first’ and ‘second’ in the previous chapter. If you have closed your browser since then, they will have been forgotten anyway. But, even if not, when you read in a data set and assign it to a dataframe, it overwrites any previous dataframe that you had with the same name.
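Here is a small sketch of that overwriting behaviour, using a hypothetical name ‘first_demo’ and made-up values so as not to disturb the real dataframes.

```r
# Create a dataframe with three rows
first_demo <- data.frame(x = c(1, 2, 3))

# Assigning to the same name again replaces the old contents, with no warning
first_demo <- data.frame(x = c(10, 20))

# Only the newer, two-row version remains
nrow(first_demo)   # 2
```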

Now we can check that their means are indeed equal.

And we can work out their standard deviations using the sd function.
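The same two checks can be sketched on three tiny made-up data sets, chosen so that the arithmetic is easy to verify by hand. The names ending in ‘_demo’ and the column name ‘value’ are assumptions, not the names or values in the real data sets.

```r
# Three invented data sets, each centred on 10 but spread out differently
first_demo  <- data.frame(value = c(9, 10, 11))
second_demo <- data.frame(value = c(5, 10, 15))
third_demo  <- data.frame(value = c(0, 10, 20))

# The means are all the same
mean(first_demo$value)    # 10
mean(second_demo$value)   # 10
mean(third_demo$value)    # 10

# The standard deviations grow as the values spread further from the mean
sd(first_demo$value)      # 1
sd(second_demo$value)     # 5
sd(third_demo$value)      # 10
```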

The numbers are slightly different from those that I presented in the chapter, and we’ll see the reason for this when we get to Chapter 6, but it isn’t anything to worry about for now. There’s no mistake.

Calculating the variance is just as easy as calculating the standard deviation. For example:

If we square the value of the standard deviation, we get exactly the same result:

The two values of 34.74892 match exactly.

The little ^ symbol means ‘to the power of’ and so ^2 means ‘to the power of 2’ or ‘squared’ (i.e. multiplied by itself).
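As a sketch of the relationship between the two quantities, the box below uses R’s var and sd functions on a small invented vector called ‘values’ (a hypothetical name, not one of the chapter’s data sets). Because computers store decimals to limited precision, all.equal is a safer way to compare the two results than checking the printed digits by eye.

```r
# The ^ operator on a simple number: 3 squared
3^2   # 9

# Invented values, for illustration only
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Variance, calculated directly
var(values)

# Standard deviation, squared
sd(values)^2

# Confirm that the two agree, allowing for tiny rounding differences
all.equal(var(values), sd(values)^2)   # TRUE
```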

Hopefully you can see that the functionality of R allows us to work out these quantities pretty easily.